Created: 2026-03-06 07:53:04
Updated: 2026-03-06 07:53:04

AEP for continuous random variables:
Let $X_{1},\dots,X_{n}$ be i.i.d. random variables with density $f(x)$. Then

$$-\frac{1}{n} \log f(X_{1},\dots,X_{n}) \to E[-\log f(X)] = h(X) \quad \text{in probability}$$

typical set:

$$A_{\epsilon}^{(n)} = \left\{(x_{1},\dots,x_{n})\in S^n : \left| -\frac{1}{n}\log f(x_{1},\dots,x_{n}) - h(X)\right| \leq \epsilon\right\}$$


$$\mathrm{Vol}(A) = \int_{A} dx_{1}\,dx_{2}\cdots dx_{n}, \qquad A \subseteq \mathbb{R}^n$$

The typical set $A_{\epsilon}^{(n)}$ has the following properties:

  1. $\Pr(A_{\epsilon}^{(n)})\geq 1-\epsilon$ for $n$ sufficiently large
  2. $\mathrm{Vol}(A_{\epsilon}^{(n)}) \leq 2^{n(h(X)+\epsilon)}$ for all $n$
  3. $\mathrm{Vol}(A_{\epsilon}^{(n)}) \geq (1-\epsilon)2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large
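The AEP itself can be sanity-checked by simulation. A minimal sketch, assuming $X\sim\mathscr{N}(0,1)$, for which $h(X)=\frac{1}{2}\log_2 2\pi e \approx 2.05$ bits (the function and variable names are illustrative, not from the text):

```python
import math
import random

# Monte Carlo check of the continuous AEP for i.i.d. X_i ~ N(0, 1):
# -(1/n) log2 f(X_1, ..., X_n) should concentrate around h(X).
def log2_normal_density(x):
    """log2 of the standard normal density at x."""
    return -0.5 * math.log2(2 * math.pi) - (x * x / 2) * math.log2(math.e)

random.seed(0)
n = 200_000
empirical = -sum(log2_normal_density(random.gauss(0, 1)) for _ in range(n)) / n
h = 0.5 * math.log2(2 * math.pi * math.e)   # ≈ 2.0471 bits
print(f"-(1/n) log f = {empirical:.4f},  h(X) = {h:.4f}")
```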

Theorem 9.3.1: If the density $f(x)$ of the random variable $X$ is Riemann integrable, then

$$H(X^{\Delta}) + \log \Delta \to h(f) = h(X) \qquad \text{as } \Delta\to 0$$

Thus the entropy of an $n$-bit quantization of a continuous random variable $X$ is approximately $h(X)+n$.
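This relationship is easy to verify empirically. A sketch, assuming $X\sim\mathscr{N}(0,1)$ and bin width $\Delta = 2^{-5}$ (both choices are mine, for illustration):

```python
import math
import random

# Quantize X ~ N(0,1) into bins of width delta and compare the discrete
# entropy H(X^Δ) with h(X) - log2(delta), i.e. h(X) + 5 bits here.
random.seed(1)
delta = 2.0 ** -5
n = 200_000
counts = {}
for _ in range(n):
    b = math.floor(random.gauss(0, 1) / delta)   # bin index = X^Δ
    counts[b] = counts.get(b, 0) + 1
H = -sum(c / n * math.log2(c / n) for c in counts.values())
h = 0.5 * math.log2(2 * math.pi * math.e)
print(f"H(X^Δ) = {H:.3f},  h(X) - log2(Δ) = {h - math.log2(delta):.3f}")
```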

Conditional differential entropy:

$$h(X\mid Y) = -\int f(x,y)\log f(x\mid y) \, dx\,dy$$

$$h(X\mid Y) = h(X,Y) - h(Y)$$
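For a jointly Gaussian pair, both sides of this identity have closed forms, which makes it easy to check numerically (the covariance values below are an arbitrary illustrative choice):

```python
import math

# Check h(X|Y) = h(X,Y) - h(Y) for (X, Y) jointly Gaussian with
# Var(X) = 2, Var(Y) = 1, Cov(X, Y) = 0.8 (illustrative values).
Kxx, Kyy, Kxy = 2.0, 1.0, 0.8
det_K = Kxx * Kyy - Kxy ** 2                     # 2x2 determinant
h_joint = 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det_K)
h_y = 0.5 * math.log2(2 * math.pi * math.e * Kyy)
# For Gaussians, X | Y = y is normal with variance Kxx - Kxy^2 / Kyy.
h_x_given_y = 0.5 * math.log2(2 * math.pi * math.e * (Kxx - Kxy ** 2 / Kyy))
print(h_joint - h_y, h_x_given_y)
```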


The differential entropy of a multivariate normal distribution is:

$$h(X_{1},\dots,X_{n}) = h(\mathscr{N}_{n}(\mu,K)) = \frac{1}{2} \log \left[(2\pi e)^n |K|\right]\ \ \text{bits}$$
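Evaluating the formula for a concrete case (a 2×2 covariance matrix chosen purely for illustration):

```python
import math

# h(N_2(0, K)) = 0.5 * log2((2*pi*e)^n * |K|) for an illustrative K.
K = [[2.0, 0.6],
     [0.6, 1.0]]
det_K = K[0][0] * K[1][1] - K[0][1] * K[1][0]    # = 1.64
h = 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det_K)
print(f"h = {h:.4f} bits")
```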

Relative Entropy:

$$D(f\mid\mid g) = \int f\log \frac{f}{g} \, dx$$

Motivated by continuity, we set $0\log \frac{0}{0}=0$.
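As a concrete check, for two unit-variance Gaussians $f = \mathscr{N}(0,1)$ and $g = \mathscr{N}(1,1)$ the integral has the closed form $D(f\mid\mid g) = \frac{(\mu_f-\mu_g)^2}{2} = \frac{1}{2}$ nat, which direct numerical integration reproduces (the grid limits and step are an arbitrary choice):

```python
import math

# D(f||g) for f = N(0,1), g = N(1,1) by numerical integration.
# Here log f - log g simplifies to (1 - 2x)/2, so D = 1/2 - E_f[x] = 0.5 nat.
def pdf(x, mu):
    return math.exp(-((x - mu) ** 2) / 2) / math.sqrt(2 * math.pi)

dx = 1e-3
D = sum(pdf(x, 0.0) * math.log(pdf(x, 0.0) / pdf(x, 1.0)) * dx
        for x in (i * dx - 10.0 for i in range(20_000)))
print(f"D(f||g) ≈ {D:.4f} nats")
```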


Mutual information:

$$I(X;Y) = \int f(x,y) \log \frac{f(x,y)}{f(x)f(y)} \, dx\,dy = D(f(x,y) \mid\mid f(x)f(y))$$

$$I(X^\Delta;Y^\Delta) \to I(X;Y) \qquad \text{as } \Delta\to 0$$

Hadamard's inequality:
If we let $\vec{X}\sim \mathscr{N}(0,K)$ be a multivariate normal random vector, then substituting the Gaussian entropies into the independence bound $h(X_{1},\dots,X_{n})\leq \sum_{i=1}^{n} h(X_{i})$ gives us

$$|K| \leq \prod_{i=1}^n K_{ii}$$
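A quick numerical check on an illustrative positive-definite matrix:

```python
# Hadamard's inequality |K| <= prod_i K_ii for a 3x3 positive-definite K.
K = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]

def det3(M):
    """Determinant by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

lhs = det3(K)
rhs = K[0][0] * K[1][1] * K[2][2]
print(f"|K| = {lhs} <= {rhs}")
```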

Properties of differential entropy:

  1. $h(X+c)=h(X)$
  2. $h(aX) = h(X)+\log|a|$
  3. $h(A \vec{X}) = h(\vec{X})+\log |\det A|$, where $|\det A|$ is the absolute value of the determinant
  4. Let the random vector $\vec{X}\in \mathbb{R}^n$ have zero mean and covariance $K=E\vec{X}\vec{X}^t$, i.e. $K_{ij}=EX_{i}X_{j}$. Then $h(\vec{X})\leq \frac{1}{2} \log\left[(2\pi e)^n |K|\right]$, with equality if and only if $\vec{X}\sim \mathscr{N}(0,K)$.
     Proof: Let $g(\vec{x})$ be any density satisfying $\int g(\vec{x})x_{i}x_{j} \, d \vec{x}=K_{ij}$, and let $\phi_{K}$ be the density of a $\mathscr{N}(0,K)$ random vector. Note that $\log \phi_{K}(\vec{x})$ is a quadratic form and $\int x_{i}x_{j}\phi_{K}(\vec{x}) \, d \vec{x}=K_{ij}$. Then

$$\begin{align} 0 & \leq D(g\mid\mid \phi_{K}) \\ & = \int g\log\left( \frac{g}{\phi_{K}} \right) \\ & = -h(g) - \int g\log \phi_{K} \\ & = -h(g) - \int \phi_{K}\log \phi_{K} \\ & = -h(g)+h(\phi_{K}) \end{align}$$

where the second-to-last equality substitutes $\phi_{K}$ for $g$ inside $\int g\log \phi_{K}$; this is valid because $\log \phi_{K}$ is a quadratic form and $g$ and $\phi_{K}$ share the same second moments $K_{ij}$.

Among all distributions with the same variance, the normal distribution has maximum entropy. We use this bound to bound the entropy of a discrete random variable. The bound will not be stated in terms of the variance of the original variable, because a discrete random variable can have arbitrarily small variance and still have large entropy; instead it is stated via an integer-valued random variable with the same probabilities.

Let $X$ be a random variable taking values in $\mathscr{X}=\{a_{1},a_{2},\dots\}$ with probability mass function

$$\text{Pr}(X=a_{i}) = p_{i}$$

Thm 9.7.1:

$$H(p_{1},p_{2},\dots) \leq \frac{1}{2} \log\left[ 2\pi e\left( \sum_{i=1}^\infty p_{i} i^2-\left( \sum_{i=1}^\infty ip_{i} \right)^2 + \frac{1}{12} \right)\right]$$

and, since entropy is invariant under relabeling of the outcomes, for every permutation $\sigma$,

$$H(p_{1},p_{2},\dots) \leq \frac{1}{2} \log\left[ 2\pi e\left( \sum_{i=1}^\infty p_{\sigma(i)} i^2-\left( \sum_{i=1}^\infty ip_{\sigma(i)} \right)^2 + \frac{1}{12} \right)\right]$$

Proof: Define two new random variables. The first, $X_{0}$, has distribution

$$\text{Pr}(X_{0}=i)=p_{i}$$

The second, $U$, is uniformly distributed on $[0,1]$ and independent of $X_{0}$. Let $\tilde X=X_{0}+U$. Then

$$\begin{align} H(X_{0}) & = -\sum_{i=1}^\infty p_{i}\log p_{i} \\ & = -\sum_{i=1}^\infty\left( \int _{i}^{i+1} f_{\tilde X }(x) \, dx \right) \log \left( \int _{i}^{i+1} f_{\tilde X}(x) \, dx \right) \\ & = -\sum_{i=1}^\infty \int _{i}^{i+1} f_{\tilde X}(x)\log f_{\tilde X}(x) \, dx \\ & = -\int _{1}^\infty f_{\tilde X}(x)\log f_{\tilde X}(x) \, dx \\ & = h(\tilde X) \end{align}$$

where the third equality holds because $f_{\tilde X}$ is constant (equal to $p_{i}$) on each interval $[i, i+1)$.

Hence we have the following chain of inequalities:

$$\begin{aligned} H(X) & = H(X_{0}) = h(\tilde X) \\ & \leq \frac{1}{2}\log \left[2\pi e\,\text{Var} (\tilde X)\right] \\ & = \frac{1}{2} \log \left[2\pi e \left(\text{Var}(X_{0}) + \text{Var}(U)\right)\right] \\ & = \frac{1}{2} \log \left[2\pi e\left( \sum_{i=1}^{\infty} p_{i}i^2 -\left( \sum_{i=1}^\infty ip_{i} \right)^2 + \frac{1}{12} \right)\right] \end{aligned}$$
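The bound can be checked against a distribution whose entropy is known exactly. For the geometric pmf $p_i = 2^{-i}$, $i = 1, 2, \dots$, we have $H = 2$ bits, $E[i] = 2$, and $E[i^2] = 6$; a sketch (the truncation point $N$ is my choice):

```python
import math

# Thm 9.7.1 for p_i = 2^{-i}: H = 2 bits exactly, and the bound is
# 0.5 * log2(2*pi*e * (Var + 1/12)) with Var = 6 - 2^2 = 2.
N = 60                               # tail beyond 2^-60 is negligible
p = [2.0 ** -i for i in range(1, N + 1)]
H = -sum(pi * math.log2(pi) for pi in p)
m1 = sum(i * pi for i, pi in enumerate(p, start=1))
m2 = sum(i * i * pi for i, pi in enumerate(p, start=1))
bound = 0.5 * math.log2(2 * math.pi * math.e * (m2 - m1 ** 2 + 1 / 12))
print(f"H = {H:.4f} bits <= bound = {bound:.4f} bits")
```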
